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Abstract 

Recent advances in molecular interrogation techniques now allow unprecedented 
genomic inference about the role of adaptive genetic divergence in wild popula- 
tions. We used high-throughput genotyping to screen a genome-wide panel of 276 
single nucleotide polymorphisms (SNPs) for the economically and culturally im- 
portant salmonid Oncorhynchus mykiss. Samples included 805 individuals from 1 1 
anadromous and resident populations from the northwestern United States and 
British Columbia, and represented two major lineages including paired popula- 
tions of each life history within single drainages of each lineage. Overall patterns 
of variation affirmed clear distinctions between lineages and in most instances, 
isolation by distance within them. Evidence for divergent selection at eight candi- 
date loci included significant landscape correlations, particularly with temperature. 
High diversity of two nonsynonymous mutations within the peptide-binding re- 
gion of the major histocompatibility complex (MHC) class II (DAB) gene provided 
signatures of balancing selection. Weak signals for potential selection between sym- 
patric resident and anadromous populations were revealed from genome scans and 
allele frequency comparisons. Our results suggest an important adaptive role for 
immune-related functions and present a large genomic resource for future studies 
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Introduction 

Inference of the structure and relatedness of natural popula- 
tions exploded in the 1960s and 1970s with the development 
of molecular genetics and a deepening of our understand- 
ing of the genetic basis driving evolutionary change (e.g., 
Lewontin 1970). Our ability to resolve closely related pop- 
ulations evolved steadily through time with improvements 
in interrogation techniques (reviewed in Schlotterer 2004; 
Seeb et al. 2011a). Inference from single nucleotide poly- 
morphisms (SNPs) during the last decade has sharpened our 
ability to observe differences among populations with addi- 
tion of data from adaptively important loci (Anderson et al. 
2005; Paschou et al. 2007; Helyar et al. 201 1). The advances in 
studying functional genetic variation through genome scans 
(Storz 2005) have proven especially rewarding for studies 



aiming at linking phenotypic variations to a genotypic back- 
ground in natural populations (Dalziel et al. 2009; Nielsen 
et al. 2009a). 

For decades, population genetics of Pacific salmonids has 
attracted substantial attention from both managers and re- 
searchers due to their economic importance as well as their 
complex biology where broadly diverse life histories have 
been described (reviewed in Utter 2004). A genetic ba- 
sis for a range of different life histories, including oceanic 
migratory patterns, has been described over the years (see 
Quinn 2005 and references therein). The Pacific salmonid 
Oncorhynchus mykiss (Fig. 1) has been extensively studied 
reflecting its charisma in both recreational fisheries and aqua- 
culture (Wishard et al. 1984; Jantz et al. 1990). Oncorhynchus 
mykiss has naturally colonized a range of habitats across 
the Beringial region from Kamchatka, Russia, in the west 
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Figure 1. Wild rainbow trout (Oncorhynchus mykiss) in their natural environment (Photo by Finn Sivebaek). 



to Mexico in the southeastern part of its native distribution 
(MacCrimmon 1971). Wild populations have furthermore 
been successfully introduced throughout the world (Mac- 
Crimmon 1971) making it a model species for investigating 
local adaptation in the wild (e.g., Rubidge and Taylor 2004; 
Narum et al. 2008; Pearse et al. 2009; Narum et al. 2010b). 
In O. mykiss, two North American lineages likely predating 
the last glacial maximum (Allendorf and Utter 1979) included 
populations along a broad coastal region of the Pacific North- 
west and distinct from those inland (also referred to as red- 
band trout) primarily east of the Cascade Range in the Upper 
Columbia and Fraser Rivers (Fig. 2; Allendorf and Utter 1979; 
Utter et al. 1980; McCusker et al. 2000). These two clades 
are hereafter referred to as the coastal and inland lineages. 
Two distinct life-history forms of O. mykiss include anadro- 
mous steelhead, having extensive oceanic migrations, and 
purely freshwater resident rainbow trout. However, generally 
higher among-region than within-region genetic variation 
for sympatric steelhead and rainbow trout supports a poly- 
phyletic nature of the assumingly derived resident life history 
(Docker and Heath 2003; Heath et al. 2008; Pearse et al. 2009) . 
Most studies to date remain inconclusive regarding potential 
molecular adaptations and selective agents maintaining this 
life-history variation (e.g., Docker and Heath 2003; Heath 
etal. 2008). 

Recent work using SNPs in nonmodel organisms has al- 
lowed increased knowledge about the role of functional ge- 
netic variation (Nielsen et al. 2009a). A large majority of 
SNPs are neutral and provide useful information about neu- 



tral evolution and demographic inference. However, SNPs 
residing within, or linked to, expressed genes such as those 
that encode for stress or immune responses, may encode alle- 
les subject to natural selection and add insight into adaptive 
evolutionary processes (Morin et al. 2004; Bouck and Vision 
2007). As key effectors of the adaptive immune system and 
displaying an unequaled level of polymorphism for coding 
genes, loci of the major histocompatibility complex (MHC) 
have received intense attention as candidates for genes un- 
der selection (e.g., reviews in Bernatchez and Landry 2003; 
Piertney and Oliver 2006). In teleost fishes alone, a 2008 
review reported that the available sequence information in- 
cluded 3559 MHC class I and class II allelic variants from 137 
species (Wegner 2008). Salmonids are particularly well suited 
for quantifying selective pressures, because of the minimalis- 
tic genetic architecture of their MHC loci: whereas other ver- 
tebrates possess multiple duplicated loci for both MHCI and 
MHCII, salmonids have just one locus with classical MHCI 
function (UBA), and one classical locus for each subunit of 
the MHCII (DAA and DAB) (Hansen et al. 1999; Landry and 
Bernatchez 200 1 ) . Nonclassical loci have been found for both 
MHCI (Dijkstra et al. 2006; Miller et al. 2006) and MHCII 
(Harstad et al. 2008), but these are highly divergent, charac- 
terized by low levels of polymorphism and are functionally 
different from the antigen-presenting classical loci. 

In salmonid fishes, classical MHC loci have been inves- 
tigated as being subject to balancing selection within and 
among natural populations (e.g., Miller et al. 2001; Aguilar 
et al. 2004) or divergent selection (Landry and Bernatchez 
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Figure 2. Map of sampling locations. An 
approximate projection of the current divide 
between the inland and coastal lineages is 
shown by a thick broken line (From Behnke 
1992). 



Grey symbols = Coastal lineage 
White symbols = Inland lineage 
Rectangles = Anadromous 
Ovals = Resident 




2001; Miller et al. 2001; Gomez-Uchida et al. 2011). The 
type of selection inferred for these genes has been shown 
to depend on the spatial scale considered with a tendency 
toward balancing selection acting at smaller regional scales 
(e.g., Miller et al. 2001). In contrast, patterns of divergent se- 
lection have often been inferred among populations at larger 
spatial scales and those inhabiting different environments 
(Bernatchez and Landry 2003). However, divergent selec- 
tion has also been found at fine spatial scales between eco- 
types of the same lake system (Gomez-Uchida et al. 2011; 
McGlauflin et al. 201 1) indicating that factors such as habitat 
type or correlated variables are important drivers of selec- 
tion at these genes. Thus, MHC markers show great poten- 
tial for understanding and disentangling complex patterns of 
adaptive processes in natural populations inhabiting varying 
habitats such as salmonids in the Pacific Northwest. 

The purpose of this study was to screen a new genome- 
wide SNP resource in O. mykiss for signatures of local adap- 



tation over large parts of its native distribution. Defining 
populations as all genetically differentiated samples, we com- 
pared diverse spawning habitats representing different envi- 
ronmental regimes including two paired steelhead and rain- 
bow trout populations within the same rivers. We will refer 
to steelhead and rainbow trout as anadromous and resident 
populations, respectively, in order to generalize our find- 
ings for other fish species exhibiting migratory life-history 
variation, including sockeye salmon (O. nerka) (Taylor et al. 
1996) and brown trout (Salmo trutta) (Elliott 1994). Us- 
ing a panel of 276 SNPs designed with the intent of spac- 
ing loci as widely as possible across the genome (including 
newly developed markers previously unscreened in natural 
populations), we find strong neutral structure between the 
two major lineages. We probed outlier loci for signatures 
of adaptation based on different habitats and life histories 
and found strong adaptive signatures for immune-related 
genes. 
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Table 1. Sample information and summary statistics. Sample size (n), expected and observed heterozygosity (Ho), allelic richness (A R ), and 
percent polymorphic SNPs are given. Statistics are given for pooled samples for locations with temporal replicates. 





Population name 


ID 


Year 


Lineage 


Life history 


n 


He 


H 0 


Ar 


Percent polymorphic SNPs 


1 


Sustut River 


SUST96 


1996 


Coastal 


Anadromous 


50 


0.20 


0.20 


1.62 


70% 




Sustut River 


SUST97 


1997 


Coastal 


Anadromous 


45 










2 


Skagit River 


SKAG 


2007 


Coastal 


Anadromous 


59 


0.25 


0.24 


1.80 


89% 


3 


Skagit River 


SKAGRES 


2009 


Coastal 


Resident 


23 


0.09 


0.09 


1.45 


55% 


4 


Green River 


GREEN 


2007 


Coastal 


Anadromous 


35 


0.22 


0.22 


1.73 


80% 


5 


Sol Due River 


SOL 


2009 


Coastal 


Anadromous 


94 


0.23 


0.23 


1.76 


92% 


6 


Chehalis River 


CHEH 


2007 


Coastal 


Anadromous 


95 


0.22 


0.21 


1.69 


82% 


7 


Deschutes River 


DESCH 


1999 


Inland 


Anadromous 


95 


0.18 


0.18 


1.60 


75% 


8 


Crooked Fork Creek 


CROOK99 


1999 


Inland 


Anadromous 


48 


0.18 


0.18 


1.59 


82% 




Crooked Fork Creek 


CROOK01 


2001 


Inland 


Anadromous 


47 










9 


Twisp River 


TWISP 


2008 


Inland 


Anadromous 


81 


0.19 


0.18 


1.65 


85% 


10 


Twisp River 


TWISPRES07 


2007 


Inland 


Resident 


25 


0.20 


0.18 


1.71 


86% 




Twisp River 


TWISPRES08 


2008 


Inland 


Resident 


13 










11 


Deadman Creek 


DEADM97 


1997 


Inland 


Anadromous 


76 


0.19 


0.18 


1.57 


64% 




Deadman Creek 


DEADM99 


1999 


Inland 


Anadromous 


19 











Materials and Methods 

Sampling 

We analyzed 823 individuals representing 15 collections (n = 
24-95) of O. mykiss from nine rivers throughout the Pacific 
Northwest of North America (Table 1; Fig. 2). Our collections 
include sampling of sympatric anadromous and resident fish 
from the Twisp River and sampling of allopatric anadro- 
mous and resident populations from the Skagit River (see 
Table 1). Temporal replicates of populations from four lo- 
cations were also analyzed (Table 1) to increase sample sizes 
and assure consistency in observed spatial structure (Waples 
1990). For Twisp River collections, all resident fish were col- 
lected further upstream compared to sampling of sympatric 
anadromous fish. Residents were typically collected during 
July, and fish were targeted visually by size (> 180 mm), ro- 
bust body shape, coloration (visible parr marks, spotting, 
bold color), or evidence of spawning (wounds, scale loss, 
fin tears, expressing milt); however residents were collected 
from many areas that were known spawning locations of steel- 
head. In contrast, the resident collection from the North Fork 
Cascade River in the upper Skagit River system (SKAGRES) 
was expected to represent an upstream resident population 
physically isolated from downstream anadromous popula- 
tions (no upstream gene flow) due to the existence of an 
approximately 30-m high waterfall. 

Molecular analyses and number 
of SNP markers 

Genomic DNA was extracted from fresh fin or opercu- 
lum tissue using QIAGEN DNeasy 96 tissue kits (Qiagen, 
Valencia, California, USA). PCR amplification and geno- 
typing was performed in 96.96 Dynamic Arrays using the 



Fluidigm IFC thermal cyclers and BioMark instruments fol- 
lowing the protocols of Seeb et al. (2009). All genotypes were 
scored automatically using the BioMark Genotyping Analy- 
sis software (Fluidigm, San Francisco, California, USA) and 
verified by two independent scorers. Any discrepancies were 
reassessed and either kept as a consensus or discarded. Fur- 
thermore, eight individuals from each 96 DNA sample plate 
(i.e., 9% of all samples) were genotyped twice for one-third 
of the SNPs on independent arrays to ensure reproducibil- 
ity of results. We screened 276 SNPs (compiled from Aguilar 
and Garza 2008; Brunelli et al. 2008; Campbell et al. 2009; 
Sanchez et al. 2009; Stephens et al. 2009; Narum et al. 2010a; 
Abadia-Cardoso et al. 2011; Hansen et al. 2011 and unpub- 
lished sources listed in Table S 1 ) , of which 1 0 did not conform 
to Hardy- Weinberg equilibrium (HWE) or were suggested to 
be in linkage disequilibrium (LD), and these were excluded 
from further statistical analyses (see results, Appendix 1 ) leav- 
ing 266 informative SNPs. Another 21 were situated in six 
pairs and three triplets within the same coding gene, and are 
consequently very tightly linked (Appendix 2; Table SI). For 
subsequent statistical analyses relating to neutral population 
structure, 12 of these, together with eight suggested outliers 
(P < 0.01), were discarded in order to assume neutrality 
and independence among a set of 246 remaining markers 
(results, Appendix 1). Genome scans and landscape genetics 
analyses, which are based on individual marker information, 
were based on 266 SNPs including all 21 SNPs in known 
linkage groups (see below, Appendix 1). 

Temporal stability of allele frequency 
distributions 

Pairwise F S t estimates among the 15 collections were gen- 
erated in Arlequin 3.5 (Excoffier and Lischer 2010) using 
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10,000 permutations with P values corrected for multiple 
tests using the sequential Bonferroni method (k = 105) (Rice 
1989). These tests revealed lower estimates between all pairs 
of temporally replicated collections (Fst = —0.002-0.011) 
compared to all spatial comparisons (f ST = 0.013-0.375). 
Assuming a conservative a value of 0.001, all temporal com- 
parisons remained nonsignificant while spatial comparisons 
were all significant. Temporal replicates were pooled to opti- 
mize sample sizes for a total of 1 1 populations in subsequent 
analyses. 

Conformance to HWE and nonrandom 
segregation of SNPs 

Conformance to HWE was tested independently for each lo- 
cus in each of the 1 1 populations using the MC algorithm 
implemented in Genepop 4.0 (Rousset 2008). P values were 
corrected using the sequential Bonferroni method (k = 3069) 
(Rice 1989). Linkage was only known for those nine groups of 
SNPs that were ascertained in single Sanger sequencing reads 
(Hansen et al. 2011; Table SI). Thus, we simply tested for 
nonrandom segregation of all pairs of loci within each pop- 
ulation using Fisher's tests for gametic LD as implemented 
in Genepop 4.0 (Rousset 2008). Due to the high number of 
tests performed (i.e., 36,315 for each population), no cor- 
rection for multiple tests was performed since this approach 
would be overly conservative and likely underestimate truly 
significant relationships. We followed a hierarchical approach 
with the following criteria for assessing LD among markers: 
(1) only SNPs with a minor allele frequency (MAF) > 0.10 
were considered due to an expectedly high number of false 
positives associated with low levels of variation, (2) only 
locus pairs showing more than 50% significant tests (P < 
0.05) with at least six performed tests among 1 1 populations, 
(3) for all locus pairs showing significant LD, SNPs potentially 
involved in multiple pairs were discarded. 

Summary statistics 

Individual global F S t values were estimated for each locus 
in Genepop 4.0 (Rousset 2008) as well as over all loci. Mean 
expected (H E ) and observed (H 0 ) heterozygosity were cal- 
culated for each locus and population using GenAlEx 6.4 
(Peakall and Smouse 2006). Allelic richness (A R ), a mea- 
sure of the number of alleles corrected for minimum sample 
size, was calculated for all populations using FSTAT v2.9.4 
(Goudet 1995). GenAlEx 6.4 was used to report the percent- 
age of polymorphic loci in each population. 

Spatial population structure and diversity 

We used 246 neutrally behaving and individually segregating 
SNPs to recalculate pairwise Pst estimates among all 1 1 pop- 
ulations in Arlequin 3.5 (Excoffier and Lischer 2010) using 
10,000 permutations. We then used pairwise _F S t values (all 



P < 0.001) to generate a multi dimensional scaling (MDS) 
plot in ViSta 5.6.3 (Young 1996) for visualizing neutral pop- 
ulation structure. 

Signatures of selection 

To detect genomic regions under selection, we used a total of 
266 SNPs including 21 SNPs from known groups of tightly 
linked SNPs. Inclusion of known linked SNPs is expected to 
increase the chance of finding signatures of selection, as even 
closely linked SNPs may differ substantially in their level of 
differentiation (Gomez-Uchida et al. 2011). To test for po- 
tential related bias, analyses were repeated with a reduced 
dataset not including linked SNPs. First, we performed a 
global genome scan for nine populations (i.e., omitting the 
two resident populations) using the model by Excoffier et al. 
(2009) as implemented in Arlequin 3.5 (Excoffier and Lischer 
2010). This approach simulates a neutral distribution of F S t 
(or _Fct) in relation to observed heterozygosity. Observed 
values by locus were then projected onto this distribution, 
and loci lying above or below the simulated 99% confidence 
threshold for neutral variation were considered as candidates 
for divergent or balancing selection, respectively. We applied 
the hierarchical test by grouping populations into two groups 
representing the coastal and inland lineages (Table 1). We as- 
sumed a model of 10 simulated groups with 100 demes and 
performed 100,000 simulations. To further understand the 
spatial pattern of potential selection for all outliers (P < 0.0 1 ) 
detected for both Fst (i.e., among all populations) and Fct 
(i.e., between lineages here), we plotted major allele frequen- 
cies over all populations. We also performed a genome scan 
over all anadromous populations using BayeScan 1.0 (Foil 
and Gaggiotti 2008). We ran 10 pilot runs of 5000 iterations 
with an additional burn-in of 50,000 iterations and a thinning 
interval of 50 followed by a final sample size of 10,000. Re- 
sults from the two genome scan approaches were compared 
and dual outliers were considered as strong candidates for 
diversifying selection. 

To investigate divergent selection between migratory life- 
history types, we performed individual genome scans for 
each of the two within-river anadromous and resident pop- 
ulation pairs (i.e., SKAG and SKAGRES as well as TWISP 
and TWISPRES) using 10 simulated demes and 100,000 sim- 
ulations in Arlequin 3.5. Again, we plotted major allele fre- 
quencies over all populations for all outliers above the 99% 
confidence levels. BayeScan 1.0 was not considered here, since 
it is expected to perform poorly with few samples (Foil and 
Gaggiotti 2008). 

Environmental effects on adaptive variation 

We further tested for associations between landscape vari- 
ables and allelic distributions for each SNP to reinforce 
evidence of natural selection acting on outlier loci (as op- 
posed to false positives) (e.g., Fraser et al. 2011). Underlying 
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correlations between allele frequencies and landscape param- 
eters may occur by chance due to either isolation by distance 
or to more similar landscapes between neighbor populations. 
If not taken into account, such neutral background noise is 
expected to lead to an increased false-positive rate (Coop 
et al. 2010). We applied the Bayesian linear model imple- 
mented in the software Bayenv (Coop et al. 2010) to correct 
this. This method uses a covariance matrix based on neutral 
markers to filter out signals from neutral population struc- 
ture while testing for significant relationships between land- 
scape variables and locus-specific allele distributions. Results 
are given for 266 SNPs (as described above), and each land- 
scape variable as a Bayes factor (BF). This BF reflects the 
ratio of the posterior support given to a model where the 
landscape variable has a significant effect on allele distribu- 
tions over an alternative model where there is no effect on the 
SNP. First, we estimated a covariance matrix using the 246 in- 
dependently and neutrally behaving SNPs (see above). Then, 
we tested for correlations between each of 266 SNPs and the 
following variables: (1) precipitation, (2) maximum temper- 
ature, (3) minimum temperature, (4) elevation, (5) latitude, 
and (6) longitude (Appendix 3). We used multiple indepen- 
dent Markov chain Monte Carlo (MCMC) runs with chain 
lengths of 100,000 iterations to ensure convergence of the 
model. 

Genetic variation at MHC genes 

Some of the loci that we used were previously annotated and 
represent potential candidate genomic regions for selection 
(Table SI). In this study, we restrict our a priori focus to six 
newly developed markers residing within the classical MHC 
class I (Omy_UBA3a, Omy_UBA3b, and Omy_UBA2a) and 
the classical MHC class II (Omy_DAB-431, Omy_DABb, and 
Omy_DABc) genes (Hansen et al. 2011; Table SI). We ob- 
served intriguingly high diversity at two of the MHC class 
II SNPs (Omy_DAB-431, Omy_DABb) known to be nonsyn- 
onymous (Hansen et al. 20 1 1 ) . To test for balancing selection 
on this gene, we reconstructed most likely haplotypes from 
the three SNPs in known linkage within the MHC class II gene 
(Table SI) using the program PHASE V2.1 by Stephens et al. 
(2001). We then used the Ewens-Watterson homozygosity 
test as implemented in Arlequin 3.5 (Excoffier and Lischer 
2010) to test for balancing selection on reconstructed hap- 
lotypes within populations and overall assuming an infinite 
allele mutation model and using 10,000 simulations. This 
test compares the expected HW homozygosity based on ob- 
served haplotype frequencies (here designated as Observed F 
value) with a simulated value (Expected F value) expected at 
mutation drift equilibrium for a gene with a similar number 
of alleles (Ewens 1972; Watterson 1978), and where balanc- 
ing selection will lead to smaller observed than expected F 
values. 



Results 

Laboratory analyses and tests for HWE 
and LD 

We excluded 18 individuals with missing data at more than 
50% of the loci, which likely reflected poor DNA quality. Six 
of the 276 SNPs that we initially screened showed signifi- 
cant deviation from HWE in five or more populations after 
correcting for multiple tests and were excluded from further 
analyses (Appendix 1; Table SI). We observed seven pairs of 
loci showing significant LD (P < 0.05) in more than half of 
the performed tests leading to the exclusion of four loci, of 
which some were involved in multiple pairs, to avoid poten- 
tial pseudoreplication by including markers in LD ( Table SI). 
Further analyses were based on a final dataset of 805 individ- 
uals representing 11 populations (n = 23-95) and 266 SNPs 
(Appendix 1). 

Summary statistics 

Over all populations, these 266 SNPs showed varying levels of 
differentiation with global locus-specific f ST values ranging 
from 0.00 to 0.68. The frequency of polymorphic loci varied 
from 55 to 92% among populations (Table 1). We observe 
intermediate levels of genetic diversity ranging from 0.09 to 
0.25 (H E ), 0.09 to 0.24 (H 0 ), and 1.45 to 1 .80 (A R ) with highly 
reduced levels in the SKAGRES population (Table 1). 

Spatial population structure and diversity 

Spatial population structure inferred from significant pair- 
wise Fst values supported a pattern where most of the varia- 
tion is likely caused by genetic drift and limited gene flow from 
historical isolation of the two lineages (Fig. 3). However, be- 
cause most of the variation plotted for DIM 1 in the MDS plot 
was driven by the isolated SKAGRES population (Fig. 3A), 
omitting this population improved resolution for inferring 
spatial structure among remaining populations (Fig. 3B). 
The five coastal lineage populations cluster according to ge- 
ography where the distant SUST population separates from 
a Puget Sound group (SKAG and GREEN) and a coastal 
Washington group (SOL and CHEH). Observed population 
structure within the inland lineage reflects contemporary ge- 
ographic isolation with clear differentiation of the DEADM 
population (Canada) from remaining populations within the 
Columbia River drainage (Fig. 3B). 

Signatures of selection 

Candidates for balancing selection could not be distinguished 
from loci with observed F S t values of zero, and we did not 
consider these further. The global genome scan assuming a 
hierarchical island model in Arlequin 3.5 revealed eight sig- 
nificant outliers for divergent selection (P < 0.01) at the -Fst 
level (Fig. 4A) of which five loci were also candidates at the 
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SUST 
□ 




■ GREEN 




■ SKAG 




□ SOL 

CHEH ■ 




DEAOM 
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O TWISPRES 


O SKAGRES 


DESCHD o CROOK 
□ 
TWISP 



2 

c 



DEADM 

□ 



GREEN 
■ 

■ SKAG 



□ OCSCH 

□ TWISP 
q □ CROOK 



-1 0 
DIM 1 (71%) 



-1 0 
DIM 1 (81%) 



Figure 3. Multi dimensional scaling (MDS) plot showing: (a) spatial population structure for all populations including the two resident populations, 
and (b) a similar plot without the SKAGRES population. Population symbols follow Figure 2. 



Fqt level (Fig. 4B). This finding is more than twice as many 
as expected by chance alone (1% of 266 = 2.7). BayeScan de- 
tected four of the eight global outliers (P < 0.01) found with 
Arlequin as well as three new outliers (Table SI). These three 
outliers only detected by BayeScan were all characterized by 
very low minor allele frequencies (0.00 1-0.0 12), leaving them 
essentially uninformative. These were not considered, and 
we only interpret the eight candidates detected by Arlequin 
further. Functional roles could be inferred for two of the out- 
liers, Omy_ILlb-163 and Omy_ndk-152, which reside within 
interleukin and nucleoside diphosphate kinase genes, respec- 
tively (Table SI; references therein). As expected, the five F C t 
outliers reflect substantial differences between the two lin- 
eages (Fig. 5A), while all three outliers only detected at the 
Fst level suggest a pattern of divergent selection within the 
coastal lineage as observed from deviating allele frequencies 
in the northern SUST compared to the other coastal lineage 
populations (Fig. 5B). 

Outliers under potential divergent selection (P < 0.01) 
were also detected in local genome scans for selection between 
within river populations exhibiting alternate life histories. We 
observe one outlier in the allopatric Skagit River and six in the 
sympatric Twisp River comparisons (Fig. 4C and 4D). Allele 



frequency plots for these local outliers all reveal a potential 
effect of anadromy. For example, allele frequencies for the 
anadromous Twisp River population are generally more sim- 
ilar to other inland anadromous populations compared to the 
sympatric resident population (Fig. 6A). For the outlier de- 
tected from the Skagit river populations, this pattern is even 
more pronounced (Fig. 6B). When also considering outliers at 
the 95% level, SNPs within interleukin genes (Omy_IL17-185 
and Omy_IL6-320) are observed as outliers in the Skagit 
and Twisp River comparisons, respectively (Fig. 4C and 4D; 
Table SI). Another marker (Omy_97954-618) appears as an 
outlier for both locations at the 95% level suggestive of a 
consistent pattern of diversifying selection (Table SI). 

Environmental correlates 

Bayesian inference for correlation between locus-specific al- 
lele distributions and landscape variables showed that 12 of 
17 (71%) global candidates for divergent selection (P < 0.05; 
Table SI) significantly correlated with one or more variables 
(Table 2), contrasted with only 1 1 of 246 neutrally behaving 
loci (4%). When only considering "decisive" relationships 
(i.e., loglO (BF) > 2), only outlier loci correlated with any of 
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Figure 4. Outlier tests for identifying signatures of selection, (a) FsT-based global test assuming hierarchical structure by grouping all anadromous 
populations within each lineage into two major groups, (b) Fa-based global test assuming hierarchical structure as in (a). Local outlier tests for the 
Skagit River (c) and Twisp River (d) anadromous and resident population pairs are also shown. All outliers above the 99% confidence threshold are 
labeled including two interleukin genes at the 95% threshold (c and d) shown in italic. Plotted heterozygosity values are scaled by estimates of within 
population heterozygosity (ho) and locus specific Fst as: (Hi = hn/[1 - ^st]) as described in Excoffier et al. (2009). 



the variables (Table 2). Particularly precipitation and temper- 
ature appear promising for explaining patterns of divergent 
selection at some of the candidate loci or linked genomic 
regions found here. 

Genetic variation at MHC genes 

One SNP within the MHC class I gene (Omy_UBA2a) was 
discarded due to significant deviation from HWE (Table SI). 
Remaining MHC markers conformed to neutrality in genome 
scans coupled with low diversity in three of five markers 
(Appendix 4). However, the two nonsynonymous mutations 
residing within the MHC class II gene (Omy_DAB-431 and 
Omy_DABb) exhibit high levels of variation throughout most 
populations (Appendix 4). Reconstructed haplo types based 
on three SNPs within the MHC class II gene revealed a signifi- 
cant deviation from neutrality toward balancing selection for 
two of 1 1 populations (Table 3). Furthermore, three popula- 
tions had P values below 0.10 and all populations, except the 



Skagit River resident (SKAGRES), had smaller than expected 
F values pointing toward balancing selection (Table 3). 

Discussion 

We found highly significant spatial structure with increased 
levels of neutral differentiation between and within the two 
major lineages. This result is consistent with previous stud- 
ies on O. mykiss from this region using allozymes and 
mtDNA (e.g., Allendorf and Utter 1979; McCusker et al. 
2000). Applying multiple independent analytical steps (e.g., 
genome scans, landscape genomics, and raw allele frequency 
plots), the accumulated evidence supports local adapta- 
tion at several genomic regions including immune response 
genes. 

Spatial population structure and diversity 

Most of the presumed neutral genetic variation that we ob- 
served can be explained by a model of historical vicariance. 
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Figure 5. Frequency plots of major allele frequencies for loci detected as outliers (P < 0.01) in the global genome scan including nine anadromous 
populations, (a) Allele frequencies for five outliers detected at both the Fst and Fct level, (b) Allele frequencies for three outliers only detected at the 
Fst level. 



Distinct evolutionary lineages have seemingly accumulated 
genetic differentiation through genetic drift over glacial peri- 
ods. Contemporary gene flow and drift appear less important 
at this large spatial scale but probably play a greater role at 
smaller regional scales among more recently diverged popula- 
tions. This observation is supported by previous phylogenetic 
observations (Bagley and Gall 1998; McCusker et al. 2000) 
and early allozymes studies (Utter etal. 1980) showing similar 
patterns of strong differentiation between inland and coastal 
lineages compared to population structure within lineages. 
The location above a waterfall of the SKAGRES population 
likely explains its divergence (e.g., Fig. 3A) and low genetic 
diversity (Table 1) as a reflection of limited gene flow and 
low effective population size (N e ) with consequently strong 
genetic drift. A similar scenario has been shown for another 
physically isolated population of resident O. mykiss (Pearse 
et al. 2009; Martinez et al. 20 1 1 ). Omitting the SKAGRES pop- 
ulation revealed further regional structure within the coastal 



lineage, suggesting more recent population histories poten- 
tially coupled with higher contemporary gene flow among 
populations within the Puget Sound and western Washing- 
ton coastal regions, respectively (Fig. 3B). Despite the large 
spatial scale, these observations hold great promise for ap- 
plying this SNP panel in future studies focusing on smaller 
geographic scales. Variation between the geographically dis- 
tant populations from Sustut River and Deadman Creek in 
BC, Canada (Fig. 3B) is also apparent. The increased dif- 
ferentiation observed for the Canadian populations within 
respective lineages is likely a reflection of the substantial geo- 
graphical separation coupled with longer divergence from the 
other populations. It is noteworthy how weak this latter signal 
is compared to that observed between lineages, suggesting a 
limited reflection of contemporary genetic drift compared to 
signals from historical separation. 

Despite potential gene flow between the sympatric anadro- 
mous and resident Twisp River populations (see also Christie 
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Figure 6. Frequency plots of major allele frequencies for loci detected as outliers (P < 0.01) between sympatric and resident population pairs, (a) 
Allele frequencies for six markers detected as outliers between the Twisp River populations, (b) Allele frequencies for one outlier detected between 
the populations within the Skagit River. Arrows denote populations included in the local genome scans. 



et al. 201 1), our results suggest some level of neutral popula- 
tion structure (pairwise Fst = 0.01; P < 0.001). Zimmerman 
and Reeves (2000) found evidence for reproductive isolation 
of sympatric steelhead and rainbow trout in the Deschutes 
River, Oregon, and explained this with variation in timing 
and location of spawning activities. A similar scenario with 
some level of spatio-temporal reproductive isolation of the 
two Twisp River populations would be in accordance with 
our observations. 

Signatures of spatially divergent selection 

We detected eight candidates for directional selection among 
all populations (Fig. 4A). Five of these were in accordance 
with divergent selection between the coastal and inland lin- 
eages (Figs. 4B and 5A). Allele frequency plots (Fig. 5A) re- 
veal high levels of information from the five F C t outliers for 
distinguishing between the two lineages. Population history 



and dynamics of populations spawning within or in close 
proximity to the transition zone has been difficult to infer 
in previous studies applying fewer and assumingly neutral 
markers (Currens et al. 2009; Blankenship et al. 2011). Out- 
liers observed in our study appear promising for future inves- 
tigations of the nature (e.g., distinguishing neutral and adap- 
tive genetic variation) and extent of this transition zone in 
O. mykiss. 

Two outliers were known to reside within an interleukin 
gene (Omy_ILlb-163) and a nucleoside diphosphate kinase 
(Omy_ndk-152) gene (Table SI). A recent study also found 
Omy_ndk-152 and another SNP within an interleukin gene 
to be affected by selection in relation to anadromy at a much 
finer scale among populations within the Klickitat River (all 
derived from the inland lineage) draining into the Columbia 
River system (Narum et al. 2011). Our broad spatial rep- 
resentation of anadromous populations limits the ability to 
identify the geographic scale at which selection acts upon 
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Table 2. Results from Bayesian inference of locus-specific landscape 
correlations. Gray cells denote a locus-parameter relationship with a 
log 1 0 (BF) between 1 .3 and 2.0, which can be interpreted as a P-value 
between 0.01 and 0.05. Black cells represent decisive relationships with 
log 10 (BF) > 2.0 or equivalent P-values below 0.01. Here, global out- 
liers include loci at the 5% significance level (see Table S1, Supporting 
information). 



Tested variables* 



SNP 

OM500180 

OMS00118 

Omy_star-206 

Omy_121713-115 

OMS00013 

Omy_1 0703 1-704 

Omy_1 12301-202 

OmyJLlb-163 

Omy_ndk-152 

Omy_1 08007-1 93 

OM500103 

OMS00081 

Omy1004 

OMS00081 

OMS00103 

Omy_111005-159 

Omy_DABb 

Omy_u09-56-073 

Omy_hsp47-86 

Omy_gluR-79 

OMS00053 

Omy_rapd-167 

Omy_09AAD-076 



Selection Precip Tmax Tmin Elev Lat Long 
Outlier 



Outlier 
Outlier 
Outlier 
Outlier 



Outlier 
Outlier 
Outlier 
Outlier 
Neutral 
Neutral 
Neutral 
Neutral 
Neutral 
Neutral 
Neutral 
Neutral 
Neutral 
Neutral 
Neutral 




'Precip = annual mean precipitation (mm); Tmax = annual mean max- 
imum temperature (°C); Tmin = annual mean minimum temperature 
(°C); Elev = elevation (m); Lat = latitude; Long = longitude. 



outlier loci. However, by applying a hierarchical island model 
and comparing outliers at the f S t and F C j levels, we can de- 
duce whether divergent selective forces are likely to dominate 
within or between the two lineages (Fig. 5). A recent study 
by Meier et al. (2011) showed that both number and types 
of outlier loci for divergent selection varied substantially at 
different spatial scales in brown trout. This pattern demon- 
strates the need for denser sampling of populations if the 
goal is to increase the spatial resolution of inferred selective 
processes. The observed outliers might therefore be shaped 
by heterogeneous landscapes, or other selective agents, oper- 
ating at smaller geographic scales within each lineage (e.g., 
Narum et al. 2008; Narum et al. 2010b). Indeed, our land- 
scape genomics analysis suggested an important link between 
landscape variables and several loci (Table 2). Due to the in- 
herent uncertainty of correlations between predefined vari- 
ables such as used in this study, we refrain from concluding 
direct functional relationships for specific loci or landscape 
parameters (see also Bierne et al. 201 1). Nevertheless, looking 
at overall trends two main findings can be inferred from this 
analysis. First, genetic variation associated with surrounding 
landscape variables was dominated by outlier loci suggesting 



Table 3. For each population, number of reconstructed haplotypes, 
observed, and expected levels of homozygosity (F value) are given with 
results from the Ewens-Watterson homozygosity test for deviation from 
neutrality at an MHC class II gene (see text for more details). P-values 
below 0.05 are highlighted in bold, and P-values between 0.05 and 0.10 
are shown in italic. 





No. of 


Observed 


Expected 




Population 


haplotypes 


F value 


F value 


P-value 


CHEH 




0 289 


0 467 


0 1 04 


CROOK 


6 


0.295 


0.468 


0.113 


DEADM 


5 


0.317 


0.527 


0.076 


DESCH 


5 


0.255 


0.530 


0.009 


GREEN 


5 


0.328 


0.463 


0.187 


SKAG 


5 


0.313 


0.500 


0.096 


SKAGRES 


2 


0.841 


0.777 


0.569 


SOL 


6 


0.277 


0.467 


0.079 


SUST 


4 


0.500 


0.606 


0.363 


TWISP 


6 


0.248 


0.457 


0.034 


TWISPRES 


6 


0.312 


0.409 


0.274 


Mean 


5.1 


0.361 


0.516 


0.173 


SD 


1.2 


0.173 


0.101 


0.168 



a general pattern of local adaptation to specific environments 
by O. mykiss. Second, precipitation and temperature (or cor- 
related factors), in particular, may play important roles in 
shaping adaptive genetic variation in O. mykiss. An effect of 
temperature would be in accordance with three regional (Pst) 
outliers following a latitudinal trend within the coastal lineage 
(Fig. 5B). Arecent study by Wenger etal. (2011) also suggests 
an important role of temperature and flow regime (expected 
to be partly correlated with precipitation) in determining the 
distribution of suitable habitat for O. mykiss, adding support 
for crucial adaptive roles of these environmental parameters. 
However, these association-based findings remain indirect 
in nature, but direct links between variations in genotype, 
phenotype, and fitness remain very rare for any organism. 
Future studies obtaining much denser genomic coverage (see 
e.g., Hohenlohe et al. 2010) will allow a more direct chro- 
mosomal location of the gene(s) actually under selection. 
Alternatively, surveillance of fully controlled populations al- 
lows to track gene frequencies over time after being exposed 
to a new environment (e.g., Barrett et al. 2008). Thus, our 
results can be seen as hypothesis generating for future studies 
specifically investigating effects of certain landscape variables 
or candidate genes. 

Migratory life-history types 

We identified one outlier potentially under divergent se- 
lection between Skagit River resident and anadromous 
populations (Fig. 4C). For the Twisp River resident and 
anadromous populations, we found six putative outliers for 
divergent selection (Fig. 4D). The one outlier observed be- 
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tween the two Skagit River populations is consistent with 
that expected by chance alone. Furthermore, true outliers 
can be difficult to distinguish from false positives in this 
comparison considering the high levels of observed neutral 
differentiation between these two populations (pairwise F S j 
= 0.30, P < 0.001). However, the observation of six out- 
lier loci (i.e., 2.3%) at the 99% confidence level between the 
Twisp populations, together with another marker showing 
signatures of selection in both locations, suggest ongoing se- 
lection between anadromous and resident life-history types. 
This overall result is consistent with recent studies identify- 
ing signatures of divergent selection between different mi- 
gratory variants in O. mykiss (Martinez et al. 2011; Narum 
et al. 201 1). All outlier loci show allelic patterns suggestive of 
distinctions between resident populations and the anadro- 
mous counterparts within the same lineages. For example, 
allele frequency plots reveal a consistent pattern of higher 
similarity among all inland anadromous populations than 
between the anadromous and resident Twisp River popula- 
tions (Fig. 6A). These differences are small and at best weak 
indicators of ongoing selection between life histories. How- 
ever, a mere effect of increased drift in an assumingly smaller 
and more isolated resident TWISPRES population cannot 
explain these observations since a general trend of increased 
variation was observed for this population (Table 1; Fig. 6A). 
Despite this generally inconclusive pattern, these outliers 
may potentially prove rewarding in future studies with a 
more targeted focus on studying selection between these life 
histories. 

Evidence of selection acting on immune 
response genes 

The two known nonsynonymous mutations in the peptide- 
binding region of a MHC class II gene (Omy_DAB_431 and 
Omy_DABb) generally showed high levels of diversity in 
most populations with MAF ranging between 0.26 and 0.48 
(Appendix 4). Although only two populations gave signifi- 
cant results in direct tests for balancing selection acting on 
reconstructed haplotypes of the MHC class II gene, a clear 
overall trend toward balancing selection was revealed (Ta- 
ble 3). Lack of more significant findings maybe due to limited 
statistical power from the limited number of alleles (Table 3) 
when reconstructing haplotypes from only three segregating 
SNPs. However, since these mutations change amino acids 
in the crucial peptide-binding region of the MHC class II 
molecules, we would expect that lack of balancing selection 
would have led to elimination of otherwise assumingly dele- 
terious mutations in just a few generations. While our results 
are only indicative of balancing selection within or among 
populations, many previous studies have detected patterns of 
balancing selection acting on MHC loci in other salmonid 
fishes (Landry and Bernatchez 2001; Miller et al. 2001), in- 



cluding O. mykiss (Aguilar and Garza 2006). Furthermore, a 
recent study by Martinez et al. (201 1) detected divergent se- 
lection on a microsatellite locus linked to a MHC class II gene 
between steelhead and an upstream isolated resident popu- 
lation of O. mykiss. Although not discussed by the authors, 
a general trend of reduced genetic diversity was observed 
in the landlocked resident population; however, this pop- 
ulation exhibited increased levels of diversity at the MHC- 
linked marker in accordance with balancing selection within 
the resident population. More convincing conclusions about 
balancing selection can be obtained from analyses based on 
sequencing larger fragments of genes covering multiple poly- 
morphic sites. McCairns et al. (2011) followed this approach 
for a fragment of the peptide-binding region of a MHC class 
II gene in stickleback ( Gasterosteus aculeatus) and found simi- 
lar evidence for balancing selection. Sequence-based analyses 
are in general expected to be more powerful for detecting bal- 
ancing selection compared to individual marker based outlier 
tests (Renaut et al. 2010; Brieuc and Naish 2011; Narum and 
Hess 2011). 

We also found interleukin genes among outliers in all three 
genome scans (Fig. 4) . A recent study by Narum et al. (20 1 1 ) 
also found interesting patterns for these three loci. They 
found Omy_IL-320 to be a candidate locus for anadromy 
in O. mykiss populations from the Klickitat basin in the 
Columbia tributary, Washington. This result is in agreement 
with our observations at this locus showing a signature of 
divergent selection between the resident and anadromous 
populations in the Twisp River (Fig. 4D). Furthermore, we 
observed outlier patterns for two other interleukin markers 
Omy_ILlb-163 and OmyIL17-185 (Fig. 4B and 4C). These 
two markers were observed to correlate with one or more 
environmental variables in the study by Narum et al. (20 1 1 ) , 
indicative of adaptive roles. For example, Narum et al. (201 1) 
found the Omy_ILlb-163 locus to correlate with temper- 
ature, and this finding is also supported here at a larger 
spatial scale (Table 2). Despite the different spatial scales, 
our results together with the study by Narum et al. (2011) 
add strong support for an important adaptive role of inter- 
leukin genes in O. mykiss. Temperature tolerance, or fac- 
tors correlating with temperature such as parasite abun- 
dance and virulence (e.g., Marcogliese 2008), have also been 
shown to infer selection on immune genes in other fish and 
animals in general (e.g., Kurtz et al. 2004; Sommer 2005; 
McCairns et al. 2011). 

In conclusion, we observed interesting patterns of adaptive 
variation at both interleukin genes (divergent selection) and 
a MHC class II gene (balancing selection). Here, the latter is 
represented by three SNPs hitherto unscreened in wild popu- 
lations. These candidate genes will inevitably prove valuable 
in future studies of O. mykiss investigating the evolutionary 
role of immune response processes. 
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The promise of applying functional genetic 
variation in conservation genomics 

Genome scans including functional genetic variation have 
proven very promising for identifying (and understanding) 
adaptively important genes and traits in nonmodel organ- 
isms (e.g., Namroud et al. 2008; Nielsen et al. 2009b; Glover 
et al. 2010), also see Vasemagi and Primmer (2005) and 
Storz (2005) for reviews. First, identification of intraspecific 
adaptive variation among populations is crucial for identify- 
ing focal intraspecific population units of high conservation 
value. Further, identification of highly discriminatory loci 
will greatly increase power for use in management related 
assignment tests (e.g., Freamo et al. 2011) or mixed-stock 
analyses (Freamo et al. 2011; Seeb et al. 2011b) of natu- 
ral populations. Future studies identifying adaptive variation 
are thus expected to contribute toward development of more 
effective conservation plans at the intraspecific level of wild 
nonmodel organisms. 
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Appendix 1. SNP exclusion pipeline. Ellipses isolate numbers of SNPs in the various categories. Solid arrows connect the stages of analysis, and 
broken arrows identify sets of SNPs that were excluded at each stage. 
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Appendix 3. Tested environmental variables and data sources. 



Environmental variable 


SUST 1 


SKAG 


GREEN 


SOL 


CHEH 


DESCH 


CROOK 


TWISP 


DEADM 1 


Precipitation (mm) 2 


965 


873 


1082 


2501 


1413 


279 


979 


395 


263 


Maximum temperature (°C) 3 


21.0 


22.7 


24.6 


20.0 


24.8 


32.2 


28.0 


29.5 


26.2 


Minimum temperature (°C) 4 


-13.3 


0.9 


1.7 


1.9 


1.0 


-3.5 


-8.7 


-9.6 


-9.2 


Elevation (m) 5 


1352 


9 


22 


9 


28 


397 


1049 


494 


336 


Latitude 


56.58 


48.44 


47.29 


47.91 


46.80 


44.82 


46.51 


48.37 


50.74 


Longitude 


-126.45 


-122.34 


-122.17 


-124.54 


-123.17 


-121.09 


-114.68 


-120.14 


-120.92 



Sources: 

1 Precipitation and temperature data for the BC samples obtained from http://www.genetics.forestry.ubc.ca/cfcg/climate-models.html 

2 http://www.prism.oregonstate.edu/products/matrix.phtml?vartype=ppt&view=maps. 

3 http://www.prism.oregonstate.edu/products/matrix.phtml?vartype=tmax&view=maps. 

4 http://www.prism.oregonstate.edu/products/matrix.phtml?vartype=tmin&view=maps. 

5 Obtained from Google Earth using coordinates. 
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Appendix 4. Frequency plots of major allele frequencies for five MHC-related SNPs. Omy_DAB_431 and Omy_DABb are nonsynonymous mutations 
in the peptide-binding region of a MHC class II gene, while Omy.DABc represents a synonymous mutation in the same gene. Omy_UBA3a and 
OmyJJBA3b show allele frequencies for two SNPs residing within a MHC class I gene. 
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